ABSTRACT

The "Instructional Improvement Digest" communicates advisory information about practical courses of action that can be implemented by teachers and administrators to improve key areas of school instruction. The topics in the digest series draw upon inquiry associated with the Southwest Regional Laboratory for Educational Research and Development's Proficiency Verification Systems and Services and other pertinent research. The digest seeks to focus on matters of high priority in current instructional improvement activities. This article addresses the matter of how to use student test results for instructional purposes. It provides a simple and practical strategy for using test information sensibly. The interpretation of test results, subtest labels, and test items is discussed. Good test consumerism requires judging test items according to the intention of instruction and interpreting scores according to their usefulness in instructional planning.
(Author/PN)




INSTRUCTIONAL IMPROVEMENT DIGEST 



No. 6 



1982 






TEST INTERPRETATION, MISINTERPRETATION, 
AND INSTRUCTIONAL PLANNING 



Teachers are required to give achievement tests to students for many different purposes. The intention is always to help teachers and students. Whether teachers and students regard the effort as helpful, however, varies: some do and some don't. One of the ways that everyone would like the results to be helpful is in teachers' instructional planning. However, the relationship between test results and instruction seems to be elusive. Is there a secret or mystique between the two?

This article addresses the matter of how to use student test results for instructional planning purposes. It does not imply, however, that instructional planning can or should be reduced to a mechanical routine. Such planning inherently must rely on the professional knowledge of the person involved: the teacher. What the article does provide is a simple and practical strategy for using test information sensibly.



SMing to r 

Most people would agree that educators should know (1) what they are supposed to teach (intention), (2) what materials and strategies they are going to use (instruction), and (3) how they are going to identify student accomplishments (information). Seeing to it that these three components (intention, instruction, and information) work together is a desirable goal in planning and following curriculum. Though this plan looks good on paper, in practice many things can and do go awry in trying to make it work.

Efforts toward instructional improvement typically begin by assuming that skills, materials, and assessment are coordinated. School staffs put a lot of time and energy into devising and carrying out the improvement plan, and they expect to see improvement in students' test scores. But the results may not reflect the effort. Sometimes this is because the improvement effort itself was conceived hastily. Other times it is because staff expectations were unreasonably high. Most often, however, a post-mortem reveals that the problem was a lack of coordination between the underlying components: intention, instruction, and information. A case study may help to illustrate the situation.



A Case Study

All the fifth-grade teachers in a school met to review their pupils' scores on the district's competency test. They wanted to use the information from the test to plan improvement for their instructional program. Looking at results from the Composition part of the test, they noted that their students' performance in "Mechanics" was relatively low. They decided as a group to make Mechanics a priority area for improvement in the coming school year.

So the leader of the group wrote the word "Mechanics" on the chalkboard and asked what skills should be included. The responses from the other teachers were global: capitalization, punctuation, paragraph indentation, spelling, etc. At this point, the teachers could have proceeded in one of two ways: they could have decided to try to improve instruction relative to their list, or they could have paused to check their list against the district's curriculum guide for Grade 5 Composition instruction. Had they taken the second path, they would have found the following in the district's guide:
1. Capitalizes the first letter in titles: Mrs., Miss, Ms., Mr., and Dr.

2. Uses periods at the end of abbreviations and initials.

3. Capitalizes the first, last, and important words in a title.

4. Uses commas in quotations.

5. Uses commas to separate items in a series.

6. Uses commas between city and state.



If the teachers had taken the first route, that is, defined their intentions on the basis of the more global list, they might have been disappointed in their Mechanics test scores at the end of the year, since the instruction provided might not have matched the more specific skills assessed. If, on the other hand, the teachers had followed the more clearly marked instructional path, or at least had ensured that these skills were included in instruction, they would have been more apt to see improvement in the test results.

The point of the illustration is that it is very important to understand the specific instructional expectations in order to plan for effective improvement. Not all districts and schools list their expectations in such detailed form as the example. In such cases the intentions and the instructional plans may both have to be inferred from the assessment information. However, there are pitfalls in trying to infer such meaning from tests.

Pitfalls in Interpreting Test Results

Teachers face two major pitfalls in using test results for instructional planning: (1) interpreting subtest labels and scores and (2) interpreting individual test items. A Composition test, for example, may consist of subtests such as Sentence Processing, Paragraph Development, Mechanics, etc. Can we tell from these labels exactly which skills are assessed under each heading, wherever the heading is used? For instance, is "using commas" included under Mechanics in the third-grade test? Is it included under the same heading in the fourth-grade test? Does it appear at all in the fifth-grade test? By interpreting performance on individual items, we can find out how the items were answered by one student, a class, a grade level, a school, and even an entire school district. These statistics are easy to get, but what do they tell us about how students write? Let's take a closer look at both the labels and the items.

Interpreting Subtest Labels 

Achievement tests for elementary school 
students often are organized by grade level. For 
example, there is a Grade 1 Mathematics test, a 
Grade 2 Mathematics test, etc. The same holds for 
other subject areas such as Reading, Composition, 
Science, etc. Commonly, each subject area test is 
composed of several subtests, for example: 
Mathematics
    Number Recognition
    Computation
    Measurement
    Problem Solving

Reading
    Decoding
    Vocabulary
    Sentence Comprehension
    Paragraph Comprehension

Composition
    Sentence Processing
    Paragraph Development
    Mechanics
    Spelling



Though the headings are usually the same for tests and subtests at each grade level, e.g., "Mathematics" and "Measurement," the skills assessed may be very different. Relying on headings alone can therefore be misleading in interpreting test results. The pitfall is overlooking the DIFFERENCES and RELATIONSHIPS between the same labels at different grade levels. Here are some situations that illustrate the pitfall:

Situation 1: Same label but different, unrelated meanings

Grade 3 Measurement items assess recognition of the value of different money denominations, e.g., penny, nickel, dime, quarter, half-dollar, and dollar.



Grade 4 Measurement items assess knowledge of metric and nonmetric units, e.g., measuring length to the nearest centimeter, meter, inch, and foot.
Clearly, the Grade 3 and Grade 4 measurement skills are different and fairly unrelated. Moreover, these Grade 3 measurement skills are probably not prerequisite to the Grade 4 measurement skills. In other words, a student probably doesn't have to have the Grade 3 measurement skills (i.e., money) in order to be successful in learning the Grade 4 measurement skills (metric/nonmetric). For instructional planning, this situation implies that the illustrated Grade 3 measurement skills do not have to be in place prior to teaching the Grade 4 measurement skills. Where monetary and metric measures are taught is discretionary, but unless the instruction matches the assessment and vice versa, little improvement is likely to follow from the instructional plan.

Situation 2: Same label but semi-related meanings

Grade 4 Mechanics items assess the use of 
apostrophes in singular possessive forms. 
For example: 



Jenny has an uncle. Jenny likes to visit her ______ house.

a. uncle's

b. uncles

c. uncles'



Grade 5 Mechanics items assess the use of 
commas to separate items in a series, for 
example: 

There were lions ___ tigers, and elephants at the circus.

a. !

b. ,

c.

d. none of these marks



Clearly, the Grade 4 and Grade 5 Mechanics skills are different and neither is prerequisite to the other. However, both are probably required for the student to write a satisfactory story or composition in the fifth grade. For planning instruction, this situation implies that both skills probably need to be in place for a student to write a Grade 5 composition; however, the Grade 4 skill (apostrophes) does not necessarily have to be in place prior to teaching the Grade 5 skill (commas). Rather, either one of these skills can be taught first or the two may be taught concurrently.



Situation 3: Same label with direct, prerequisite meanings

Grade 3 Multiplication items assess
multiplication facts through 9. 

Grade 4 Multiplication items assess the
multiplication algorithm involving up to a 
three-digit multiplicand and up to a two- 
digit multiplier. 



The Grade 3 and Grade 4 skill area labels are the same, i.e., "multiplication." The Grade 3 multiplication skill (multiplication facts) is a direct prerequisite to the Grade 4 multiplication skill (multiplication algorithm). A student should do well on the Grade 3 multiplication skills in order to be successful in learning the Grade 4 multiplication skills. This situation implies that for students who are not skilled in Grade 3 multiplication, the teacher should plan additional instruction before teaching the multiplication skills designated for Grade 4.

In summary, each subject area is represented by subtest labels that span skills across grade levels, and those skills may be related in three different ways, each of which can present a pitfall to teachers. While we often presume that most situations are of the third type (i.e., the skills assessed at one grade level are direct prerequisites to the skills of the next grade level), that is not always so.

One reason that the other two situations are overlooked is that many of the record-keeping devices or charts tend to hide the relationships. For example, charts on which test performance is recorded by hand are usually simple grids. The system for using the grids is basically very easy to follow. Teachers mark the box with a slash (/) when the student is working on the skill, and cross the slash into an X when the student has "mastered" the skill. Usually skills are taught in the same left-to-right order as listed in the chart. Two features of this system tend to conceal the relationships between skill areas and subtests:



1. All the boxes in the typical grid are of the 
same size even though the listed skills are 
of different "sizes." For example, one 
box may represent the Grade 3 skill 
"recognizes pennies, nickels, dimes, and 
quarters." Another box of the same size 
on the Grade 3 grid may represent the skill 
"solves word problems involving two- 
digit addition or subtraction." 

2. A left-to-right check-off sequence on the 
grid tends to hide the interrelationships 
between skill areas and subtests. That is, 
the left-to-right sequence tends to imply 
that a left-side box is a prerequisite to the 
right-side box, which may or may not be 
true. 
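One way to avoid the second problem is to record prerequisite links explicitly instead of reading them off the left-to-right order of a grid. The Python sketch below is illustrative only: the skill names paraphrase the three situations described earlier, and the dictionary layout and the helper `missing_prerequisites` are hypothetical, not part of any record-keeping system the article describes.

```python
# Explicit prerequisite links (hypothetical layout). A skill with an empty
# list has no prerequisite, regardless of where it sits on a chart.
prerequisites = {
    # Situation 3: multiplication facts are a direct prerequisite.
    "G4 multiplication algorithm": ["G3 multiplication facts"],
    # Situation 1: Grade 3 money skills are not required for metric units.
    "G4 metric/nonmetric measurement": [],
    # Situation 2: apostrophes and commas may be taught in either order.
    "G5 commas in a series": [],
}


def missing_prerequisites(skill, mastered):
    """Return the prerequisites of `skill` that are not in the `mastered` set."""
    return [p for p in prerequisites.get(skill, []) if p not in mastered]


student_mastered = {"G3 money recognition"}
print(missing_prerequisites("G4 multiplication algorithm", student_mastered))
# ['G3 multiplication facts']
print(missing_prerequisites("G4 metric/nonmetric measurement", student_mastered))
# []
```

The design choice is simply that "comes before on the chart" and "must be learned first" are stored as separate facts, so a left-side box no longer implies a prerequisite.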



Interpreting Test Items 

Items are the basic building blocks of tests. The information yielded by a test is only as useful as the information provided by each test item. So let's see what test items can tell us about students' skills.

In the preceding section, we found important 
differences between subtests across grade levels. 
In the same way, two test items may have identical 
labels but still show important differences. 

This situation becomes clearer when we compare items that assess instruction indirectly with direct assessments. Two examples illustrate important differences between these item types.

Example 1: Grade 2 Measurement (Telling Time)

a. Indirect instructional item (answered correctly by 60% of Grade 2 students)



Mr. Baker washed his car. The two clocks show you when he started and when he finished. At what time did he finish?

[Two clock faces, labeled START and FINISH]

8:40     8:30     8:00     9:30



b. Direct instructional item (answered 
correctly by SOVi, of Grade 2 students) 



Mark the time.

[Clock face]

6:10     10:06



The indirect instructional item contains extraneous material; to answer the question correctly, the student doesn't need to see the START time. The direct item provides better planning information. It eliminates unnecessary distractions and focuses instead on whether students have learned the skill of telling time.



Example 2: Grade 2 Sentence Comprehension

a. Indirect instructional item (answered correctly by 50% of Grade 2 students)



Mrs. Brown is in the city with a balloon.




b. Direct instructional item (answered correctly by 73% of Grade 2 students)




Which sentence fits the picture? 

A. The turtle sits in a tree.

B. The rabbit sits on top of a tree.

C. The fox sits under a tree.



The direct assessment item requires more reading than the indirect item. Though the indirect item is brief, two elements make it more difficult for children. First, the title "Mrs." is typically part of the oral vocabulary taught in Grade 2 but not part of the reading vocabulary taught in that grade. Second, in practice materials such as workbooks, students are often asked to mark the "one that is different." Thus, some students may automatically mark the picture without the balloon.

An incorrect response to the first item may indicate that a student (a) really hasn't learned the target skill (that is, comprehending a written sentence), (b) doesn't understand the item format, or (c) is confused by the illustration, and so on. If students answer the second item incorrectly, teachers can at least be more confident that a student was truly weak in a tested skill and did not respond incorrectly because of the nature of the test. For planning purposes, the direct item provides information that is amenable to instruction. The more indirect the item, the less clear the implications for instructional planning.



The Question Is "What Is the Question?"

In interpreting test results for planning instruc- 
tion, there is one basic question that must be 
consistently asked. It is: 



Are these test questions something my students have seen or practiced in their classroom work?



If test information is to be useful for instructional planning, it must be strongly related to actual classroom practice. The examples in the earlier part of this article (telling time and comprehending sentences) illustrate how important this relationship is. These examples show how some versions of a test item provide better information for planning instruction than other versions of the "same" item. Consequently, good test consumerism or test-wiseness requires critical review and examination of the items in a test. Serious consideration of the basic question in the box above constitutes a "critical review." Two variations of this basic question are examined in the remainder of this article.



Variation 1. Are these the same questions I've been asking my students? The "same" question can be asked in different ways. We as adults and teachers may see two questions as the "same" question. However, children may see them as different questions. For example, two typical and roughly equivalent reading comprehension questions are



Question 1: What is this story about?
Question 2: What is the main idea?



Without explicit instruction, third- and fourth-grade students may well understand one question and not the other. That is, they may be able to tell you what the story is about but not be able to tell you the main idea because they are not familiar with the phrase "main idea." Conversely, students may not understand that "What is this story about?" is a request for a central theme, not just a detail of a story.

As another example, two mathematically equivalent addition problems that occur frequently in tests are

Question 1: 9 + 4 + 7 = ?

Question 2:     9
                4
              + 7



Children perform differently with horizontal and vertical formats in addition problems. Some children don't see a horizontal format except on tests. (In fact, some people don't see horizontal addition formats except on tests!) Some children may be able to correctly add 9, 4, and 7 in the vertical format but not realize that the horizontal format asks the same question. The point, again, is to ask whether the test question looks like the instruction students have been used to seeing.

Variation 2. Does the mix of test items accurately reflect the breadth and depth of my instructional program? Do the kinds of skills covered in the test represent the mix of skills taught in my program? Does the number of items devoted to each skill category represent the relative importance or amount of time spent in instruction? Just as you expect well-written unit tests or chapter tests to "mirror" the unit or chapter, so should you expect semester, year-long, or multi-year tests to mirror the instruction covered in the respective period of time.
Instructional Planning



Two major steps are involved in "using" test information for instructional planning. Together, these steps summarize most of the implications of the preceding discussion. If the test content coordinates with your instructional program, proceed directly to Step 2. If it doesn't, you have three choices.



Step 1. Analyze the test content item by item and subtest by subtest to establish its conformity to your instructional program.



a. Modify your instructional program so that it better matches the test.

b. Disregard those portions of the test that do not match your instruction.

c. Work toward coordinating the test and the instructional program (i.e., work toward modifying both the test and the instructional program).



Step 2. Look at the numbers. That is, look at student performance.



If student performance is good, then 

a. Continue as before, or 

b. Consider doing less. Students may have already learned the test content through other instruction or through other means (e.g., home or TV).

If student performance is low, then 

a. Consider doing more. Spend more time teaching the skill area. It's likely that the content is simply not being "covered" adequately.

b. Consider doing instruction differently. 
Change teaching strategies or materials. 



c. Consider doing less. It may be that the 
content area is not "worth" the instruc- 
tional effort, given other instructional 
needs. 

d. Continue as before. It may be that the 
content is dependent upon some other 
content that wasn't taught adequately. 
Thus, attending to teaching the related or 
prerequisite content may be what is needed. 
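The two steps can be summarized as a small decision sketch. It only enumerates the options listed above; choosing among them is left to the teacher. The function name `planning_options` and the 80% cut-off between "good" and "low" performance are hypothetical assumptions, since the article does not define where one ends and the other begins.

```python
# Step 1 choices, used when the test content does not match instruction.
STEP1_CHOICES = [
    "Modify the instructional program to better match the test",
    "Disregard portions of the test that do not match instruction",
    "Work toward coordinating the test and the instructional program",
]

# Step 2 options when student performance is good.
GOOD_PERFORMANCE = [
    "Continue as before",
    "Consider doing less (content may have been learned elsewhere)",
]

# Step 2 options when student performance is low.
LOW_PERFORMANCE = [
    "Consider doing more (content may not be covered adequately)",
    "Consider doing instruction differently (strategies or materials)",
    "Consider doing less (content may not be worth the effort)",
    "Continue as before, but attend to related or prerequisite content",
]


def planning_options(test_matches_instruction, percent_correct, good_cutoff=80.0):
    """Return the Step 1 choices if the test does not match instruction,
    otherwise the Step 2 options for the observed performance.

    good_cutoff is a hypothetical threshold, not one given in the article.
    """
    if not test_matches_instruction:
        return STEP1_CHOICES
    if percent_correct >= good_cutoff:
        return GOOD_PERFORMANCE
    return LOW_PERFORMANCE


for option in planning_options(True, 45.0):
    print(option)
```

Because the function returns lists rather than a single decision, it mirrors the article's position that test numbers narrow the options but do not make the choice.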

Two observations are in order regarding the preceding choices. One is that we must exercise care when deciding to "raise" low scores. Raising a score from 80% correct to 85% correct may well take more instructional effort and time than raising a score from 35% to 70% correct. Since a score of 35% basically represents "no knowledge," moving from 35% to 70% is the job of teaching something to somebody who doesn't know very much about the something to begin with. In other words, this is a fairly typical instructional job.

On the other hand, raising a score from 80% to 85% is an effort in "fine tuning." A score of 80% represents a fair amount of knowledge. Raising the score to 85% may mean removing careless errors from the performance. For example, "teaching" students to be more careful in the long division process is fine tuning and is different from teaching them the process. Trying to remove the arithmetic errors from the long division process may take more time than teaching the process itself. (Indeed, just getting students to use long division in other applications can provide practice in fine tuning.)

The second observation is that all four choices are really dependent upon knowing the substance and structure of instruction. That, after all, is the "secret" of instructional planning. Test labels and scores should support this understanding of instruction; they can be interpreted only relative to the substance and structure of the instructional program. In short, teachers should base their interpretation of test information on what they know best: their instructional program.



Summary 

The basic lesson in using test information for instructional planning is that we must get behind the labels in order to interpret tests and test items properly. If tests are used to provide information on the effectiveness of instruction or a school improvement effort, then the items represent a concrete and functional definition of the intention of instruction. It is, therefore, extremely important that the items chosen are coordinated with the intention of instruction.

Going only by general labels such as "Measurement" or "Sentence Comprehension" will not assure the desired results. Not all tests are the same, even though they may carry the same labels. Good test consumerism, i.e., test-wiseness, requires comparison shopping for tests and correct interpretation of test scores. Items must be judged according to the intention of instruction, and scores must be interpreted according to their usefulness in instructional planning. Doing anything less will yield results that don't reflect the professional time, effort, and commitment put into an instructional program. They won't show fully what students, teachers, and districts have accomplished.

—George Behr 
Senior Member of the Professional Staff
SWRL Educational Research and Development 



Note: Content of this article is drawn from a series of technical reports on student accomplishment information systems, written by Aaron Buchanan, Patricia Milazzo, and Richard Schutz.



Readers' comments are always welcome.






